MCG-ICT at MediaEval 2016: Verifying Tweets from Both Text and Visual Content

Authors

  • Juan Cao
  • Zhiwei Jin
  • Yazi Zhang
Abstract

The Verifying Multimedia Use task aims to automatically detect manipulated and fake web multimedia content. We make two important improvements this year. On the one hand, since predictions based on a single short tweet are unreliable, we propose a topic-level credibility prediction framework that exploits the internal relations among tweets belonging to the same topic. We further enhance the framework's prediction precision by sampling topics and exploring topic-level features. On the other hand, motivated by the idea that manually edited or low-quality videos tend to be fake, we draw on the handbook [1] on detecting manual edits and build a decision tree on videos.

1. PROPOSED APPROACH

We treat the task as a binary classification problem: real or fake. A tweet generally contains two kinds of content, text content and visual content, so we build a separate classification model for each. For text content, this year's task pays more attention to small events than to breaking news: more than 59% of events contain fewer than 10 tweets, and 95% contain fewer than 50. Compared with last year's 42.5 tweets per event, verifying such small events is more challenging. We propose a topic-level verification framework and improve its performance by exploring topic-level features and sampling topics. For visual content, we draw on the handbook [1] on detecting manual edits and build a decision tree on videos. Although the task focuses only on detecting fake tweets, we put effort into identifying both categories, and the resulting method performs relatively well on both real and fake tweets. More details about the task can be found in [2].

1.1 Text Analysis Approach

As illustrated in Figure 1, our text analysis framework consists of three parts: a message-level classifier, a topic-level classifier, and a fusion part.
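The three parts can be wired together roughly as in the following Python sketch. The excerpt does not say how the fusion part combines the two classifiers, so the linear interpolation, the weight `alpha`, the 0.5 decision threshold, and the function name `fuse` are all hypothetical. The sketch only illustrates how one topic-level score (a topic being the tweets that share an image or video, as described below) can adjust weak message-level scores for every tweet in that topic.

```python
def fuse(message_probs, topic_of, topic_probs, alpha=0.5):
    """Fuse message-level and topic-level P(fake) into one label per tweet.

    message_probs: {tweet_id: P(fake) from the message-level classifier}
    topic_of:      {tweet_id: id of the image/video the tweet contains}
    topic_probs:   {topic_id: P(fake) from the topic-level classifier}
    alpha:         hypothetical weight given to the topic-level score
    """
    labels = {}
    for tweet_id, p_msg in message_probs.items():
        # Tweets whose topic has no topic-level score keep their message score.
        p_topic = topic_probs.get(topic_of[tweet_id], p_msg)
        # Hypothetical fusion rule: linear interpolation of the two scores.
        p_fake = alpha * p_topic + (1.0 - alpha) * p_msg
        labels[tweet_id] = "fake" if p_fake >= 0.5 else "real"
    return labels


# Example: the topic score lifts t1 (0.3 on its own) to "fake".
print(fuse({"t1": 0.3, "t2": 0.9, "t3": 0.4},
           {"t1": "img1", "t2": "img1", "t3": "vid9"},
           {"img1": 0.8}))
# {'t1': 'fake', 't2': 'fake', 't3': 'real'}
```

Setting `alpha=0.0` reduces the sketch to the message-level classifier alone, which makes it easy to compare the two levels.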
Like many traditional text analysis methods, we first build a message-level classifier based on the given content features and user features. However, a tweet is very short (no more than 140 characters) and its meaning is often incomplete, so credibility prediction at the message level is unreliable. We observe that every tweet in our data contains videos or images, and tweets containing the same video/image, although posted independently, are strongly related to each other; more specifically, they tend to have the same credibility. To exploit these inner relations, we treat the tweets sharing a video/image as a topic and build a topic-level classifier. Compared with an individual tweet, a topic retains the principal information while eliminating random noise. As the primary contribution of the text analysis, the topic-level classifier improves the F1 score by 4% on fake tweets and by more than 8% on real tweets.

Figure 1: The framework of the text analysis approach

Copyright is held by the author/owner(s). MediaEval 2016 Workshop, Oct. 20-21, 2016, Hilversum, Netherlands.

The topic-level part has two main innovations:

Topic-level feature extraction: For each topic, we take the average of its tweets' features as the topic's features. In addition, we propose several statistical features, listed in Table 1. Combining these two kinds of features gives the complete topic-level feature set. The statistical features turn out to be quite effective for identifying fake tweets: they boost the topic-level classifier's F1 score on fake tweets by more than 14%.

Topic sampling: In our dataset, more than 59% of topics contain fewer than 10 tweets and 95% contain fewer than 50, which means there are many small topics. To remove the noise introduced by these small topics, we sample topics with high confidence in a 10-fold cross-validation process. The sampling keeps the balance between fake and real

Table 1: New Topic-Level Statistical Features
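A minimal Python sketch of the two topic-level steps follows, assuming each tweet arrives as a (media ID, numeric feature vector) pair and that out-of-fold P(fake) scores from the 10-fold cross-validation are already available. Because Table 1 is not reproduced in this excerpt, the only statistical feature added here is topic size, a hypothetical stand-in. Reading "high confidence" as max(p, 1 - p) at or above a threshold of 0.8, and balancing by keeping the n most confident topics of each class, are likewise assumptions rather than the authors' exact procedure; `topic_features` and `sample_topics` are hypothetical names.

```python
from collections import defaultdict
from statistics import mean


def topic_features(tweets):
    """Build one feature vector per topic (tweets sharing a media ID).

    tweets: list of (media_id, feature_vector) pairs.
    Returns {media_id: averaged tweet features + [topic size]}.
    """
    groups = defaultdict(list)
    for media_id, feats in tweets:
        groups[media_id].append(feats)
    result = {}
    for media_id, vecs in groups.items():
        # Column-wise mean of the member tweets' features.
        averaged = [mean(column) for column in zip(*vecs)]
        # Hypothetical statistical feature: number of tweets in the topic.
        result[media_id] = averaged + [float(len(vecs))]
    return result


def sample_topics(oof_prob_fake, labels, threshold=0.8):
    """Keep confidently predicted topics, balanced between fake and real.

    oof_prob_fake: {topic: out-of-fold P(fake) from 10-fold cross-validation}
    labels:        {topic: 1 for fake, 0 for real}
    """
    # Assumed confidence measure: distance of the prediction from either class.
    confidence = {t: max(p, 1.0 - p) for t, p in oof_prob_fake.items()}
    kept = [t for t, c in confidence.items() if c >= threshold]
    fake = sorted((t for t in kept if labels[t] == 1), key=confidence.get, reverse=True)
    real = sorted((t for t in kept if labels[t] == 0), key=confidence.get, reverse=True)
    # Balance the classes by truncating both to the smaller count.
    n = min(len(fake), len(real))
    return fake[:n] + real[:n]
```

For example, with the default threshold a topic scored 0.5 is dropped as ambiguous, and if two fake topics but only one real topic survive, only the most confident fake topic is kept alongside the real one.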

Similar resources

MCG-ICT at MediaEval 2015: Verifying Multimedia Use with a Two-Level Classification Model

The Verifying Multimedia Use task aims to detect misuse of online multimedia content and verify them as real or fake. This is a highly challenging problem because of strong variations among tweets from different events. Traditional approaches train the classifier at message level, which ignores inter-message relations. We propose a two-level classification model to exploit the information that ...

Verifying Multimedia Use at MediaEval 2016

This paper provides an overview of the Verifying Multimedia Use task that takes place as part of the 2016 MediaEval Benchmark. The task motivates the development of automated techniques for detecting manipulated and misleading use of web multimedia content. Splicing, tampering and reposting videos and images are examples of manipulation that are part of the task definition. For the 2016 editio...

Verifying Multimedia Use at MediaEval 2015

This paper provides an overview of the Verifying Multimedia Use task that takes place as part of the 2015 MediaEval Benchmark. The task deals with the automatic detection of manipulation and misuse of Web multimedia content. Its aim is to lay the basis for a future generation of tools that could assist media professionals in the process of verification. Examples of manipulation include malicio...

MediaEval 2016 Multimedia Benchmarking Initiative

MediaEval is a multimedia benchmarking initiative which seeks to evaluate new algorithms for multimedia access and retrieval. MediaEval emphasizes the "multi" in multimedia, including tasks combining various facet combinations of speech, audio, visual content, tags, users, and context. MediaEval innovates new tasks and techniques focusing on the human and social aspects of multimedia content in...

MediaEval 2016: A Multimodal System for the Verifying Multimedia Use Task

This paper presents a multi-modal hoax detection system composed of text, source, and image analysis. As hoaxes can be very diverse, we want to analyze several modalities to better detect them. This system is applied in the context of the Verifying Multimedia Use task of MediaEval 2016. Experiments show the performance of each separated modality as well as their combination.


Publication date: 2016